some terminology in learning theory.
when studying learning theory, we care about how well a hypothesis performs, not about the "specific parameterization of hypotheses or whether it is linear classification".
so we define a hypothesis class H
training error / empirical risk / empirical error of hypothesis h
- the training set has size N
- assumption 1 (one of the PAC assumptions): training examples $(x^{(i)},y^{(i)})$ are drawn iid from some probability distribution D
- $\hat{\varepsilon}(h) = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\{h(x^{(i)}) \neq y^{(i)}\}$, i.e. the fraction of training examples that h misclassifies
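to make the formula concrete, a minimal sketch, assuming `h` is a binary classifier and `X`, `y` are the training inputs and labels (all names here are illustrative):

```python
import numpy as np

def training_error(h, X, y):
    """Empirical risk: the fraction of training examples that h misclassifies."""
    preds = np.array([h(x) for x in X])            # h(x^{(i)}) for each example
    return float(np.mean(preds != np.asarray(y)))  # 1/N * sum of error indicators
```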
then we can define:
generalization error
- DEF: under assumption 1, it is the probability that h misclassifies a new example $(x, y)$ drawn from the distribution D: $\varepsilon(h) = P_{(x,y)\sim D}\big(h(x) \neq y\big)$
- it decomposes into two components: bias and variance
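we never know D in practice, but the definition suggests a thought experiment: if we had an oracle that draws fresh $(x, y)$ pairs from D, generalization error is just a misclassification probability we could estimate by Monte Carlo. a sketch under that assumption (`sample_from_D` is a hypothetical oracle):

```python
import numpy as np

def generalization_error_mc(h, sample_from_D, n_draws=100_000, seed=0):
    """Monte Carlo estimate of eps(h) = P(h(x) != y) on fresh draws from D."""
    rng = np.random.default_rng(seed)
    errors = 0
    for _ in range(n_draws):
        x, y = sample_from_D(rng)   # hypothetical oracle for the distribution D
        errors += int(h(x) != y)
    return errors / n_draws
```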
empirical risk minimization (ERM)
- the process of minimizing training error: $\hat{h} = \arg\min_{h \in H} \hat{\varepsilon}(h)$
- think of ERM as the most "basic" learning algorithm
- logistic regression can be viewed as an approximation of ERM: it minimizes a smooth surrogate of the 0-1 training error (see the sketch after this list)
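a minimal sketch of ERM, assuming a *finite* hypothesis class H so the argmin can be taken by brute force; here H is a toy set of 1-D threshold classifiers (an illustrative choice, not from the notes):

```python
import numpy as np

def erm(H, X, y):
    """Return the hypothesis in H with the smallest training error."""
    y = np.asarray(y)
    errors = [np.mean(np.array([h(x) for x in X]) != y) for h in H]
    return H[int(np.argmin(errors))]

# toy finite hypothesis class: h_t(x) = 1{x > t} for a grid of thresholds t
H = [lambda x, t=t: int(x > t) for t in np.linspace(-3.0, 3.0, 61)]
```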
expected train error
- taken by averaging (taking the expectation) over all possible training datasets of size N
- conceptually this means training on infinitely many datasets and averaging, which we can't do; instead we estimate it by drawing m training datasets of size N and averaging the training error from each (sketched below)
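the estimation procedure above, as a sketch; `learn` (a training routine returning a hypothesis) and `sample_dataset` (a data oracle) are hypothetical stand-ins:

```python
import numpy as np

def expected_training_error(learn, sample_dataset, m=50, N=100, seed=0):
    """Estimate the expected train error: draw m datasets of size N,
    train on each, and average the resulting training errors."""
    rng = np.random.default_rng(seed)
    errs = []
    for _ in range(m):
        X, y = sample_dataset(rng, N)   # hypothetical oracle: one dataset of size N
        h = learn(X, y)                 # hypothetical training routine
        preds = np.array([h(x) for x in X])
        errs.append(np.mean(preds != np.asarray(y)))
    return float(np.mean(errs))
```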
in-sample test error
- the error of h on one given test pair $(x, y)$
test error
- taken by averaging (taking the expectation) over the test data, i.e. over all the in-sample test errors
expected test error
- average over all possible training datasets of size N; again we can't do this, so we estimate it using our limited test set (see the sketch below)
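in code, the practical estimate mirrors the training-error computation, just on held-out data: average the in-sample test errors over our limited test set (a sketch, names illustrative):

```python
import numpy as np

def test_error(h, X_test, y_test):
    """Average the per-example (in-sample) test errors over a held-out set;
    this is our practical estimate of the (expected) test error."""
    preds = np.array([h(x) for x in X_test])
    return float(np.mean(preds != np.asarray(y_test)))
```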